nixos/zfs: mitigate data loss issues when resuming from hibernate #208037
Conversation
Nice! I also looked into this problem about half a year ago (after realizing that it was probably what corrupted my root pool on a good SSD another half year earlier), and I am pretty sure your solution should fix the issue. I solved it (in my host config, not in nixpkgs) by duplicating the resume logic in an early hook; your solution is definitely more elegant. It should probably be considered (and documented as) a potentially breaking change. Both the stage 1 script and ZFS's integration are pretty (legacy-)messy, and this probably does not improve that situation. This should probably come together with removing the relevant setting, or defaulting it to true.
@NiklasGollenstede thanks for your comments 🙂
How would this work? (In your answer, you can assume that I know next to nothing about nixos's test architecture, which would be true!) AFAIK, the data corruption bug (which I have never experienced myself, knock on wood) is something that happens only rarely, even when the conditions that trigger it are met (import a ZFS pool early in the boot process, then resume into an image where that pool is already imported). I can understand writing tests that cause the import-then-resume situation to occur. But I don't predict that such a test will reliably be broken without these fixes. We might need to run the test dozens of times to observe a failure.
What use cases would it break? The only one I can think of would be resuming from swap on a zvol. But every guide to swap/hibernation/zfs that I have read warns against that in pretty strong terms (examples from arch wiki; nixos wiki). So while this change is technically breaking for that use case, I'd welcome any opinions about whether it's important to call out in a changelog a breaking change to a workflow that's already explicitly flagged as broken/not working. (Or maybe you have case[s] in mind other than resume-from-zvol-swap that would be broken by this change?)
Yes, that would seem to make sense 👍 I will refrain from adding it to this PR for now, but if a consensus emerges around merging something like this change then I will change the setting to true in the next round of changes.
For example, I (and I know of quite a number of other people who say to do the same) clear my temporary filesystems in postDeviceCommands:

```nix
boot.initrd.postDeviceCommands = lib.mkAfter ''
  ( zfs list -H -o name -t snapshot -r ${cfg.temp.zfs.dataset} | grep '@empty$' | xargs -n1 --no-run-if-empty zfs rollback -r )
'';
```

At best, that line would become a no-op. It shouldn't completely make my boot fail, but it would not behave as expected (I should probably make it fail loudly). Generally, any
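(Not part of the PR: a minimal sketch of how the snippet above could be made to fail loudly instead of silently becoming a no-op. `cfg.temp.zfs.dataset` is a value from the commenter's own config, not a nixpkgs option, and exiting here simply aborts stage 1; a real config might prefer to drop to the emergency shell instead.)

```nix
boot.initrd.postDeviceCommands = lib.mkAfter ''
  # Collect the @empty snapshots we expect to roll back to.
  snaps="$(zfs list -H -o name -t snapshot -r ${cfg.temp.zfs.dataset} | grep '@empty$' || true)"
  if [ -z "$snaps" ]; then
    # Nothing found (e.g. the pool is not actually imported yet):
    # complain loudly instead of silently skipping the rollback.
    echo "ERROR: no @empty snapshots found under ${cfg.temp.zfs.dataset}" >&2
    exit 1
  fi
  echo "$snaps" | xargs -n1 zfs rollback -r
'';
```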
Excellent question! I have never worked with NixOS tests either. After looking at examples and documentation a bit, they generally look quite well designed and intuitive to me, but I am still unclear on how one best does a customized installation. So I decided to just work with my installer framework first and worry about integration into the NixOS tests later. But ...
It appears you are right about that: I got it to create a permanent corruption once, but even using the exact same script to repeat the installation and the triggering actions in the running VM, I couldn't reproduce it. Guess no tests, then (or someone will have to spend more time on finding a good test case).
I think this should be merged and backported. It may break a few people's boot processes, but the other option is to not backport it and risk data loss. And even then, the worst that can happen is that the computer doesn't boot, which is better than "hey, your zpool is now dead, better format it".
As far as ZFS is concerned, probably yes. But there may be other concerns about adding another
Personally, I'm firmly against backporting this. This PR is breaking, and we'd have no way to communicate it (no release notes) or proper way for people to stall the update (not upgrading the release channel).
There already is
I think it would be nice to bring the ZFS and stage 1 maintainers into this to discuss, pinging here or in Matrix, so the issue does not go stale. I remembered this issue because of https://www.reddit.com/r/NixOS/comments/120f4q7/nixos_didnt_start_anymore_zfs_error/
I really do hope ZFS gets this. Every time I suspend to RAM / hibernate, the ZFS pools get marked as corrupted and I have no choice but to restart the PC, as the pool is forced into offline mode.
I am not using legacy mountpoints and do not have such problems, because ZFS mounts the partitions to the right places by itself, not the script.
Two interesting points!
@MasterCATZ: Does it really happen every time? Then maybe we could craft a reproducible test case from your setup.
Also, if restarting resolves the corruption, it sounds like it's not on-disk. Did you ever try scrubbing instead? (Though that _may_ persist the corruption ...)
@sandro: I would think the timing of the import and mount vs the resume is what matters, not so much how/why they are mounted (but maybe I'm wrong). With your non-legacy mountpoints, do you let ZFS mount on import, or do you use `canmount=noauto` and `fileSystems.*.options = [ "zfsutil" ]` (to still have them mounted via fstab / systemd)?
In either case it would be interesting to know whether `/`, `/nix/store` or any other (implicitly) `neededForBoot` locations are ZFS mounts.
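(For readers following along, a minimal sketch of the second variant described above, with a hypothetical dataset name; this is illustrative, not @sandro's actual config.)

```nix
{
  # The dataset would be created with canmount=noauto
  # (e.g. `zfs set canmount=noauto rpool/safe/home`), so ZFS does not mount it
  # on import; instead it is mounted via fstab/systemd using the zfsutil helper.
  fileSystems."/home" = {
    device = "rpool/safe/home";
    fsType = "zfs";
    options = [ "zfsutil" ];
  };
}
```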
I use this, which also generates fstab entries.
I believe this is the right approach. But I do think hibernation should still be disabled by default with ZFS. While the ZFS devs believe the issues are resolved as long as you don't import before resuming, this is very untested and they all seem to agree that it should be used with caution. We also need a fix for systemd stage 1, though as of very recent systemd versions that should be as simple as adding
After some conversation with some people more knowledgeable than I: ZFS should probably not be considered safe to resume after hibernation, even if the initrd handles it carefully. From a comment on an openzfs/zfs issue:
So ZIOs can occur and end up forgotten in the interim between that system-wide sync and the actual freeze, meaning the hibernation image that you would resume from is essentially from the past compared to the data physically on the disks. All this is to say:
Given the situation, I guess we should be able to proceed with this change to improve reliability for people who decide to opt in, but we should probably show a warning or something. I don't have any strong opinion on that.
This is a reason for me to stop all further migration to Linux now; a precondition was a snapshot-able filesystem. A BIG PITA.
Apologies for that, but I don't advise going for ZFS for such things. It's mostly a server filesystem, not really a laptop/desktop one. Consider Btrfs or XFS, or bcachefs if you are not afraid of something experimental.
I'm happy with this being merged as long as
There's no "may" about it. It is unsafe, even with this change, and no matter the setup. And the risk is not to any one pool; it's to all imported pools.
Until my disasters with Linux, I understood it to be the heart of stability. For my day work, experimental is never an option; for such things, I use separate hardware.
@mabra This thread is not a place to debate Linux storage technologies. The recommendation remains, not just from myself, but from upstream ZFS devs, that you should not use hibernation while ZFS is in use.
Thanks!
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/how-good-is-zfs-on-root-on-nixos/40512/29
I was reminded of this today. I'd like to merge the ZFS improvement (even though it only makes hibernation a little safer), but there are conflicts because
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/hibernation-zfs-root-seperate-swap-safe/51711/2
@ElvishJerricco it has now become a one-line change; is that all that's needed?
Should be, yep.
@ofborg test zfs
Tests are failing for unrelated reasons. Going to fix the tests locally, build, and then merge this. Then I'll open a PR with the fixed tests.
Sorry, I never noticed this message. Lately I have not noticed ZFS complaining when resuming. My ZFS pools span multiple disk shelves using multipath devices in 3-way mirrors. If there is still a need to test, I am happy to not export and give suspend a few loops. Actually, I am spinning up my Steam library shelf now; it does not matter if that gets nuked. Pre-suspend:
I can confirm it still happens, but only to the drives that have been spun down and that are internal to the PC. I did a suspend test with ZFS15k, the pool with my 15,000 rpm drives that stores the Steam library, during a game update; however, ZFS3WAY, which is what I store my documents on and whose drives are connected internally to the PC's power supply, was forced offline and faulted, and that was without any data even being written to or read from it. Happy to apply any patches / do a ZFS update build (log file). I can do another test and power off the disk shelf to see how that affects the resume.
Edit: suspending, then powering off the shelves, then powering them on and waiting ~5 minutes for all the drives to spin up, then waking the PC: ZFS is OK. Suspending, turning the shelf off and back on, and waking the PC before the drives have all stabilized: ZFS fails. So ZFS is trying to use them before they have actually spun back up.
@MasterCATZ I hope this being merged didn't give you the impression that hibernation was ok with ZFS. We explicitly left it disabled by default because it's not. Regular suspend should be ok, but hibernation is not.
This change may break a lot of folks using some form of impermanence with ZFS, as that is often done via e.g. a snapshot rollback in `boot.initrd.postDeviceCommands`. At a minimum, I think we need a release notes entry for this, as it's a breaking change for anyone whose config expects the pool to already be imported in `postDeviceCommands`.
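(A hypothetical sketch of how such a config might be adapted under this PR, assuming the rollback just needs to run after the pool import, which now happens via the new `postResumeCommands` hook rather than `postDeviceCommands`; the dataset name and the ordering within the hook are illustrative, not taken from the PR.)

```nix
# Hypothetical impermanence-style rollback, moved so it runs after the resume
# attempt and after the ZFS import that this PR relocates to postResumeCommands.
boot.initrd.postResumeCommands = lib.mkAfter ''
  zfs rollback -r rpool/local/root@blank
'';
```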
Description of changes
There are occasional data corruption issues when resuming from hibernate on ZFS. The upstream issue is here: openzfs/zfs#260. The nixpkgs attempt to address this is here: #171680.
Particularly relevant is this comment by @bryanasdev000:
The nixos stage1 boot script is prone to the second failure case, because it imports ZFS pools in the postDeviceCommands step, which is run before attempting to resume from suspend. @infinisil made a patch to the stage1 script here which moves postDeviceCommands to after the attempt to resume: infinisil@448fe5d
This certainly fixes the problem, but I'm unsure of how general of a fix it is. postDeviceCommands is used (afaics) to load some other filesystem containers (for lack of a better word...) such as luks and btrfs. It's valid to resume from a swap file in one of those containers. So, I think the correct fix is to move only the ZFS imports to after the resume step. That's what this PR accomplishes, by introducing a new postResumeCommands step that ZFS can hook into.
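(A rough sketch of what hooking into the new step looks like from a module's or user's perspective; `postResumeCommands` is the hook named in this PR, but the exact import command below is illustrative rather than copied from the diff.)

```nix
# Commands placed here run after stage 1 has attempted to resume from
# hibernation, so a pool imported at this point cannot already be in use
# inside a resumed memory image.
boot.initrd.postResumeCommands = lib.mkAfter ''
  zpool import -N rpool   # illustrative; the real module builds this from its options
'';
```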
This is my first nixpkgs PR and I hope these changes will be more or less suitable to incorporate as is -- but if there's further exploration or discussion that's needed then I will be happy with that outcome too! 🙂